1 Properties of the Epistatic Immunity Set

1.1 Size 

As discussed in the main text, in our model the immunity set $I_n(v)$ is defined
as the set of strings that do not contain two adjacent mutations with respect
to a given string $v$ (the first and the last bit of every string are also
considered "adjacent"),

$$I_n(v) = \{\, z \in H_n : z_i \neq v_i \Rightarrow z_{|i+1|_n} = v_{|i+1|_n} \ \ \forall i \,\}. \tag{1}$$

Since cross-immunity, as defined in (1), is sensitive only to differences between
strains, $I_n(v)$ is invariant under translation on the hypercube with periodic
boundary conditions. To see this, we define a translation operator acting on
the space of sequences,

$$X_c(\cdot) = c \oplus \cdot\,, \tag{2}$$

where $c$ is a generic strain and "$\oplus$" is the XOR operator. Both the EI sets
and the Hamming distance are invariant under it:

$$v_2 \in I_n(v_1) \iff X_c(v_2) \in I_n(X_c(v_1)), \tag{3}$$

$$d(v_1, v_2) = d(X_c(v_1), X_c(v_2)) \quad \forall c. \tag{4}$$

Therefore $I_n(v) = X_v(I_n(0))$. Then, in order to characterize the static properties
of $I_n(v)$, we can take the null vector as the generating strain of the immunity set
without loss of generality. Let us now compute the cardinality of the immunity
set. To do this, we first compute the cardinality of the EI set "without
boundary conditions" (EISNB), defined as:

$$I_n^{NB}(v) = \{\, z \in H_n : z_i \neq v_i \Rightarrow z_{i+1} = v_{i+1} \ \ \forall i \in \{0, 1, \dots, n-1\} \,\}, \tag{5}$$

and then we show that the cardinality of $I_n(v)$ is a linear combination of
cardinalities of EISNB sets.
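The translation invariance stated in (3) and (4) can be checked by brute force on small hypercubes. The following is a minimal sketch, not part of the original analysis; the function names (`in_immunity_set`, `translate`, `hamming`, `check_invariance`) are hypothetical, strings are represented as tuples of bits, and $n \geq 2$ is assumed so that cyclic adjacency is well defined.

```python
from itertools import product

def in_immunity_set(z, v):
    """True if z never differs from v at two cyclically adjacent positions."""
    n = len(v)
    diff = [zi != vi for zi, vi in zip(z, v)]
    return not any(diff[i] and diff[(i + 1) % n] for i in range(n))

def translate(c, s):
    """X_c(s): bitwise XOR of the string s with the fixed string c."""
    return tuple(ci ^ si for ci, si in zip(c, s))

def hamming(a, b):
    """Number of positions at which a and b differ."""
    return sum(ai != bi for ai, bi in zip(a, b))

def check_invariance(n):
    """Check (3) and (4) exhaustively for all c, v1, v2 in H_n."""
    strings = list(product((0, 1), repeat=n))
    for c in strings:
        for v1 in strings:
            for v2 in strings:
                tv1, tv2 = translate(c, v1), translate(c, v2)
                if in_immunity_set(v2, v1) != in_immunity_set(tv2, tv1):
                    return False
                if hamming(v1, v2) != hamming(tv1, tv2):
                    return False
    return True
```

Calling `check_invariance(4)` loops over all $16^3$ triples, which is cheap; the exhaustive loop becomes slow quickly as $n$ grows.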

Lemma 1.1. Let us call $B(n)$ the cardinality of the EISNB. This number
follows the recursive law

$B(n) = B(n-1) + B(n-2)$,

with initial conditions $B(0) = 1$ and $B(1) = 2$.
Proof. 

• $I_1^{NB}(0) = \{(0), (1)\} \Rightarrow B(1) = 2$.

• $I_2^{NB}(0) = \{(0,0), (1,0), (0,1)\} \Rightarrow B(2) = 3$.

• Consider the set $I_n^{NB}(0)$ for a generic dimension $n$. This set is the
union of two disjoint sets $C_n$ and $D_n$, defined respectively as the set of
all strings belonging to $I_n^{NB}(0)$ with the first bit equal to 1 and the
set of all strings belonging to $I_n^{NB}(0)$ with the first bit equal to 0:

$$C_n = \{\, v \in H_n \ \text{s.t.}\ v = (1, 0, c),\ c \in I_{n-2}^{NB}(0) \,\}, \tag{6}$$

$$D_n = \{\, v \in H_n \ \text{s.t.}\ v = (0, c),\ c \in I_{n-1}^{NB}(0) \,\}. \tag{7}$$

In equation (6) we have used the fact that, by definition of the EISNB, if the
first bit is mutated, the second one cannot be mutated. The cardinality of the
first set is equal to $B(n-2)$ and that of the second one is equal to
$B(n-1)$. Since the two sets are disjoint, $B(n) = B(n-1) + B(n-2)$.

□ 

Thus, the cardinality of the EISNB follows the well known Fibonacci rule. We 
will use this result to determine the cardinality of the Epistatic Immunity Set. 
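Lemma 1.1 can be checked numerically by enumerating all strings of length $n$ and counting those without two adjacent mutations. The sketch below is illustrative only; the names `eisnb_size` and `B` are hypothetical helpers, and the generating strain is taken to be the null vector, as in the text.

```python
from itertools import product

def eisnb_size(n):
    """Brute-force count of length-n binary strings with no two adjacent
    ones on an open chain (no periodic boundary condition)."""
    return sum(
        1
        for z in product((0, 1), repeat=n)
        if not any(z[i] and z[i + 1] for i in range(n - 1))
    )

def B(n):
    """B(n) from the recursion B(n) = B(n-1) + B(n-2), B(0) = 1, B(1) = 2."""
    a, b = 1, 2  # B(0), B(1)
    for _ in range(n):
        a, b = b, a + b
    return a

# Compare the enumeration with the recursion for small n.
agreement = all(eisnb_size(n) == B(n) for n in range(13))
```

The enumeration costs $2^n$ operations, so it is meant only as a sanity check for small $n$.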

Lemma 1.2. Let us call $S(n)$ the cardinality of the Epistatic Immunity Set.
This number obeys the recursive law

$S(n) = S(n-1) + S(n-2)$,

with initial conditions $S(2) = 3$ and $S(3) = 4$.

Proof. As for the previous lemma, the proof is by induction.

• $I_2(0) = \{(0,0), (1,0), (0,1)\} \Rightarrow S(2) = 3$;

• $I_3(0) = \{(0,0,0), (1,0,0), (0,1,0), (0,0,1)\} \Rightarrow S(3) = 4$;

• We observe that the following relation between the EI set and the EISNB
holds:

$$I_n(0) = I_n^{NB}(0) \setminus \{\, z \in I_n^{NB}(0) : z = (1, 0, c, 0, 1),\ c \in I_{n-4}^{NB}(0) \,\}. \tag{8}$$

Then,

$$S(n) = B(n) - B(n-4). \tag{9}$$

Applying Lemma 1.1 to equation (9), one gets:

$$B(n) - B(n-4) = B(n-1) + B(n-2) - B(n-5) - B(n-6). \tag{10}$$

Then, grouping the terms as $[B(n-1) - B(n-5)] + [B(n-2) - B(n-6)]$ and using
equation (9) again, we finally find the desired relation:

$S(n) = S(n-1) + S(n-2)$.

□ 

The sequence $S(n) = S(n-1) + S(n-2)$, under the initial conditions specified
by the previous lemma, is called the Lucas sequence. As for the Fibonacci sequence,
the ratio of two consecutive numbers of the Lucas sequence converges
asymptotically to the value $\Phi = (1+\sqrt{5})/2$, which is well known as the
Golden Ratio. In particular it is easy to show that
$S(n) = \Phi^n + (1-\Phi)^n \simeq \Phi^n$.
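The relation $S(n) = B(n) - B(n-4)$ and the closed form in terms of the Golden Ratio can be checked against a direct enumeration with periodic boundary conditions. This is a small sketch with hypothetical helper names (`eis_size`, `B`, `lucas_closed_form`); it assumes $n \geq 2$ for the enumeration.

```python
from itertools import product
from math import sqrt

PHI = (1 + sqrt(5)) / 2  # Golden Ratio

def eis_size(n):
    """Brute-force count of length-n binary strings with no two cyclically
    adjacent ones (the first and the last bit count as adjacent)."""
    return sum(
        1
        for z in product((0, 1), repeat=n)
        if not any(z[i] and z[(i + 1) % n] for i in range(n))
    )

def B(n):
    """Cardinality of the EISNB: B(n) = B(n-1) + B(n-2), B(0)=1, B(1)=2."""
    a, b = 1, 2
    for _ in range(n):
        a, b = b, a + b
    return a

def lucas_closed_form(n):
    """S(n) = PHI^n + (1 - PHI)^n, rounded to the nearest integer."""
    return round(PHI ** n + (1 - PHI) ** n)

# Equation (9) needs n >= 4 so that B(n - 4) is defined by the helper above.
relation_holds = all(eis_size(n) == B(n) - B(n - 4) for n in range(4, 14))
closed_form_holds = all(eis_size(n) == lucas_closed_form(n) for n in range(2, 14))
```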

1.2 Density 

We define the Epistatic density function as the ratio between the number of
strings, $L(n,i)$, contained in $I_n(0)$ and having Hamming distance $i$ from 0 and
the number $V(n,i)$ of strings having Hamming distance $i$ from 0:

$$\rho_n(i) := \frac{L(n,i)}{V(n,i)}. \tag{11}$$

Figure 1: Epistatic density function computed numerically directly from definition (11)
for $n = 10, 30, 50, 100$. The densities are plotted as a function of $j(n) = i/\sqrt{n}$,
where $i$ is the Hamming distance from 0. As $n$ increases, the epistatic density
function converges to the well defined function $\rho_\infty(j) = \exp(-j^2)$.

This function gives an idea of how the elements of the epistatic immunity set
are distributed on the hypercube. It is easy to check that

$$L(n,i) = \binom{n-i+1}{i} - \binom{n-i-1}{i-2}.$$

The first term represents the number of strings not containing two
adjacent ones; the second term represents the number of strings not containing
any pair of adjacent ones, but with the first and the last bit both equal to one.
Thus the second term takes into account the effect of the periodic condition in
definition (1). The denominator is simply $V(n,i) = \binom{n}{i}$. The density function
(11), computed numerically, is represented for different values of $n$ and plotted
as a function of $i/\sqrt{n}$ in Fig. 1. We see that $\rho_n(i)$ can be approximated as



$$\rho_n(i) \simeq \exp\!\left(-\frac{i^2}{n}\right), \tag{12}$$

and, substituting $j = i/\sqrt{n}$, we get

$$\rho_n(j) \simeq \exp(-j^2). \tag{13}$$



Therefore, the epistatic immunity set covers an area of the hypercube whose
size grows proportionally to $\sqrt{n}$. In this area, the density of strings satisfies
equation (12). We now prove analytically the validity of approximation (12) in the
range $i < \sqrt{n}$.

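$$
\begin{aligned}
\rho_n(i) &= \frac{(n-i)!\,(n-i+1)!}{n!\,(n-2i+1)!}\left[1 - \frac{i\,(i-1)}{(n-i+1)\,(n-i)}\right] \\
&= \frac{(n-i)(n-i-1)\cdots(n-2i+2)}{n\,(n-1)\cdots(n-i+2)}\left[1 + O\!\left((i/n)^2\right)\right] \\
&= \left(1-\frac{i}{n}\right)\left(1-\frac{i}{n-1}\right)\cdots\left(1-\frac{i}{n-(i-2)}\right)\cdot\left(1 + O\!\left((i/n)^2\right)\right) \\
&= \left(1 - i/n + O\!\left((i/n)^2\right)\right)^{i-1}\cdot\left(1 + O\!\left((i/n)^2\right)\right).
\end{aligned}
\tag{14}
$$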

Using the expansion:

$$(1+x)^{i-1} = \sum_{l=0}^{i-1} \binom{i-1}{l} x^l, \tag{15}$$

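one gets:

$$
\begin{aligned}
\rho_n(i) &= \left\{ \sum_{l=0}^{i-1} \binom{i-1}{l} \left[ O\!\left((i/n)^2\right) - i/n \right]^l \right\} \cdot \left(1 + O\!\left((i/n)^2\right)\right) \\
&= \left\{ \sum_{l=0}^{i-1} \binom{i-1}{l} \left[ (-i/n)^l + O\!\left((i/n)^{l+1}\right) + O\!\left((i/n)^{l+2}\right) + \dots + O\!\left((i/n)^{2l}\right) \right] \right\} \cdot \left(1 + O\!\left((i/n)^2\right)\right) \\
&= \left\{ 1 - \frac{i^2}{n} + \frac{i^4}{2!\,n^2} - \frac{i^6}{3!\,n^3} + \dots + O\!\left(\frac{i}{n}\right) \right\} \cdot \left\{ 1 + O\!\left((i/n)^2\right) \right\}.
\end{aligned}
\tag{16}
$$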



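Then, substituting $j = i/\sqrt{n}$ and considering $j < 1$, we get equation (13):

$$
\rho_n(j) = O\!\left(\frac{1}{\sqrt{n}}\right) + 1 - j^2 + \frac{j^4}{2!} - \frac{j^6}{3!} + \dots
= \exp(-j^2) + O\!\left(\frac{1}{\sqrt{n}}\right). \tag{17}
$$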
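The closed form for $L(n,i)$ and the approximation (12) can be explored numerically. The sketch below is an illustration under stated assumptions: the function names (`L_formula`, `L_brute`, `density`) are hypothetical, the second binomial is taken to be zero whenever its arguments fall outside the valid range, and the brute-force check assumes $n \geq 2$.

```python
from itertools import combinations
from math import comb, exp

def L_formula(n, i):
    """L(n,i) = C(n-i+1, i) - C(n-i-1, i-2); the second term is taken as 0
    when i < 2 or n-i-1 < 0 (assumption for out-of-range arguments)."""
    second = comb(n - i - 1, i - 2) if i >= 2 and n - i - 1 >= 0 else 0
    return comb(n - i + 1, i) - second

def L_brute(n, i):
    """Count the ways to place i ones on a ring of n sites with no two
    cyclically adjacent ones, by direct enumeration."""
    count = 0
    for ones in combinations(range(n), i):
        occupied = set(ones)
        if not any((p + 1) % n in occupied for p in occupied):
            count += 1
    return count

def density(n, i):
    """Epistatic density rho_n(i) = L(n,i) / V(n,i), with V(n,i) = C(n,i)."""
    return L_formula(n, i) / comb(n, i)

def compare_with_gaussian(n, max_i):
    """Rows (i, rho_n(i), exp(-i^2/n)) for i = 0..max_i."""
    return [(i, density(n, i), exp(-i * i / n)) for i in range(max_i + 1)]
```

For instance, `compare_with_gaussian(100, 10)` lists the exact density next to $\exp(-i^2/n)$ in the range $i \leq \sqrt{n}$, where the two are expected to agree up to corrections that vanish as $n$ grows.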



2 Numerical estimate of m(n) 

Figure 2: Top. Cost function averaged over several realizations for the same
temperature $T(t)$ and for $k = 14, 15, 16$ and $n = 10$ (dots); minimum value of the
cost function obtained until time $t$ for $k = 16$ and $n = 10$ (continuous line).
Bottom. The acceptance probability $P(t)$, which is equal to the fraction
of solutions accepted at temperature $T(t)$. Inset. The thermal function
adopted for these simulations, $T(t) = T_0 \cdot a^t$, with $a \simeq 0.982$ and $T_0 = 15$.

In the main text we introduced the quantities $m(n)$ and $M(n)$ in order to
investigate how the introduction of a correlated rule for cross-immunity shapes
the EIS. $m(n)$ is defined as the minimum number of strings whose immunity sets
cover the whole sequence space; $M(n)$ is the maximum number of
distinct strings that can be accommodated in the space of sequences still leaving
some strings out of their EIS. Essentially, in the two cases the set of strings
that realizes the minimum (maximum) will be such as to minimize (maximize)
the overlap among immunity sets. In the main text we have computed $M(n)$
analytically and we have provided analytical and numerical estimates for $m(n)$.
In order to compute the numerical estimate for $m(n)$ we adopted a Simulated
Annealing approach [1].

We first notice that an exhaustive search for the solution would not be possible
because the space of the infection sets is too big: for instance, the number
of possible infection sets with cardinality 16 in a space of dimension 10 is
$\binom{2^n}{k} = \binom{2^{10}}{16} \simeq 10^{35}$. We instead proceeded as follows: we fix a value $k$ for
the number of elements of the infection set $A$ and we search for a configuration
which minimizes the cost function $E_{n,k}(A_k) = 2^n - |I_n(A_k)|$ (we denote by
$A_k$ any infection set with cardinality $k$). Of course, $E_{n,k}(A_k) \geq 0$ for all $k$: the
smallest $k$ for which we obtain $E_{n,k}(A_k) = 0$ for some set $A_k$ is the numerical
estimate for $m(n)$, $m_N(n)$, and the corresponding set is one of the possible $A_{min}$. For
fixed values of $n$ and $k$ we start from a random choice of the infection set $A_k$ and
we modify a randomly selected bit of a randomly selected string in $A_k$, obtaining
a new set $A'_k$. Then we compute the new cost function $E_{n,k}(A'_k)$ and

• if $E_{n,k}(A'_k) < E_{n,k}(A_k)$, then $A_k$ is replaced by $A'_k$ with probability 1;

• if $E_{n,k}(A'_k) > E_{n,k}(A_k)$, then $A_k$ is replaced by $A'_k$ with probability
$P(\Delta E) = \exp(-\Delta E / T(t))$, where $\Delta E = E_{n,k}(A'_k) - E_{n,k}(A_k)$.

We iterate this procedure $R(t)$ times for every value of the temperature $T(t)$.
Afterwards, the temperature is updated with the rule $T(t+1) = a \cdot T(t)$, where
$0 < a < 1$. The number of iterations performed for every temperature value,
$R(t)$, is chosen to grow exponentially with $t$. In fact, recalling the analogy with
Statistical Mechanics [1], the lower the temperature, the longer a body that
can exchange heat with a thermal bath needs to reach thermal equilibrium.
In Fig. 2 (top) we report the behaviour of the cost function, averaged over all
the iterations performed at the same temperature $T(t)$, as a function of $t$. In
Fig. 2 (bottom) we report the fraction of accepted solutions at time $t$ as well as
the function $T(t)$ considered. When the average cost function and the minimum
cost function stop decreasing and remain constant, a local minimum is reached.
The results obtained for the cardinality of the generating set are reported in
Fig. 3 for values of $n$ up to $n = 20$. We also report the upper and lower
bounds for $m(n)$. Due to the high computational complexity of the problem
(the number of local minima of the cost function and the size of the solution
space both grow very fast with $n$ and $k$), our numerical estimate of $m(n)$ tends
to be rougher for higher values of $n$.
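The annealing procedure described above can be sketched as follows. This is a minimal illustration, not the code used for the reported results: all function and parameter names are hypothetical, strings are encoded as integers, duplicate strings in the infection set are tolerated, and the schedule $R(t)$ is assumed to be `r0 * growth**t` capped at `max_iters` so that the run time stays bounded. The defaults `T0 = 15` and `a = 0.982` follow the values quoted in the caption of Fig. 2.

```python
import math
import random

def cyclic_offsets(n):
    """All XOR offsets d with no two cyclically adjacent ones: I_n(v) = {v ^ d}."""
    mask = (1 << n) - 1
    return [d for d in range(1 << n)
            if d & (((d << 1) | (d >> (n - 1))) & mask) == 0]

def cost(A, offsets, n):
    """E_{n,k}(A) = 2^n - |I_n(A)|: number of strings left uncovered by A."""
    covered = set()
    for a in A:
        covered.update(a ^ d for d in offsets)
    return (1 << n) - len(covered)

def anneal(n, k, T0=15.0, a=0.982, steps=200, r0=10, growth=1.02,
           max_iters=500, seed=0):
    """Simulated annealing over infection sets of k strings in H_n."""
    rng = random.Random(seed)
    offsets = cyclic_offsets(n)
    A = [rng.randrange(1 << n) for _ in range(k)]
    E = cost(A, offsets, n)
    best, best_A = E, list(A)
    T = T0
    for t in range(steps):
        iters = min(max_iters, int(r0 * growth ** t) + 1)  # assumed R(t)
        for _ in range(iters):
            j, bit = rng.randrange(k), rng.randrange(n)
            old = A[j]
            A[j] = old ^ (1 << bit)          # flip one bit of one string
            dE = cost(A, offsets, n) - E
            if dE <= 0 or rng.random() < math.exp(-dE / T):
                E += dE                       # accept the move
                if E < best:
                    best, best_A = E, list(A)
            else:
                A[j] = old                    # reject and restore
        if best == 0:
            break                             # full coverage reached
        T *= a                                # T(t+1) = a * T(t)
    return best, best_A
```

To estimate $m(n)$ with this sketch one would call `anneal(n, k)` for increasing `k` and record the smallest `k` for which the returned cost is 0. Since the search is stochastic, a nonzero result for a given `k` does not prove that no covering set of that size exists.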

3 Cluster structure of the EIS 

In the main text we have studied the cluster structure of the EIS. Let us
first recall that the immunity set, $I_n(v)$, of a single string $v$ is a connected set.
Without loss of generality we consider $v = 0$, since $I_n(v)$ is invariant under
translation on the hypercube with periodic boundary conditions, through the
translation operator (2). We can think of $I_n(0)$ as the union of the disjoint sets,
$I_n(0) = \bigcup_{i \geq 0} N_i$, with

$$N_i = \{\, z \in I_n(0) : d_h(0, z) = i \,\}. \tag{18}$$

For each $z_i \in N_i$ with $i \geq 1$ there is always a nearest neighbor $z_{i-1}$
contained in $N_{i-1}$ such that $d_h(z_i, z_{i-1}) = 1$. This implies that, for each
string in $I_n(0)$, there always exists a sequence of nearest neighbors connecting
that string to 0, i.e. $I_n(v)$ is a connected set.
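The layer-by-layer argument above can be verified by brute force for small $n$. In the sketch below (hypothetical names; strings encoded as integers; $n \geq 2$ assumed), switching off any single mutated bit of a string in $N_i$ yields a candidate in $N_{i-1}$, and the check confirms that at least one such candidate belongs to $I_n(0)$.

```python
def immunity_set_of_zero(n):
    """I_n(0): integers whose binary form has no two cyclically adjacent ones."""
    mask = (1 << n) - 1
    return {z for z in range(1 << n)
            if z & (((z << 1) | (z >> (n - 1))) & mask) == 0}

def has_downhill_neighbour(z, members, n):
    """True if clearing one set bit of z gives another member, i.e. a
    nearest neighbour lying one Hamming layer closer to 0."""
    return any((z >> b) & 1 and (z ^ (1 << b)) in members for b in range(n))

def is_connected_to_zero(n):
    """Every nonzero member of I_n(0) has a neighbour in the layer below,
    so a chain of nearest neighbours leads from any member down to 0."""
    members = immunity_set_of_zero(n)
    return all(z == 0 or has_downhill_neighbour(z, members, n) for z in members)
```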

Starting from this result, in the main text we have shown that the EIS is always
connected, though not simply connected. In fact, when $k$ strings are drawn at
random, there exists a threshold $k = \lceil n/2 \rceil$ above which the complementary
EIS (CEIS) can be broken down into clusters. This is because we
need to choose at least $\lceil n/2 \rceil$ strings in order to generate an EIS that contains
"holes", as shown in the sketch in Figure 4 (bottom). An example is given
by the following infection set:

(1, 1, 0, 0, 0, 0)
(0, 0, 1, 1, 0, 0)
(0, 0, 0, 0, 1, 1)

The string (0, 0, 0, 0, 0, 0) is not contained in the EIS generated by this set,
whereas all its neighbours are. Therefore, this string alone constitutes a
cluster of the CEIS. However, other infection sets can generate CEIS featuring
a much more complex cluster structure.
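The example above can be checked directly. The following sketch uses hypothetical helper names and represents strings as tuples of bits; it tests that the null string lies outside the EIS generated by the three strings while each of its six nearest neighbours lies inside.

```python
def in_immunity_set(z, v):
    """True if z never differs from v at two cyclically adjacent positions."""
    n = len(v)
    diff = [zi != vi for zi, vi in zip(z, v)]
    return not any(diff[i] and diff[(i + 1) % n] for i in range(n))

def in_eis(z, infection_set):
    """True if z belongs to the immunity set of at least one infecting string."""
    return any(in_immunity_set(z, a) for a in infection_set)

infection_set = [
    (1, 1, 0, 0, 0, 0),
    (0, 0, 1, 1, 0, 0),
    (0, 0, 0, 0, 1, 1),
]
origin = (0, 0, 0, 0, 0, 0)
# The six strings at Hamming distance 1 from the origin.
neighbours = [tuple(1 if j == b else 0 for j in range(6)) for b in range(6)]

origin_is_hole = (not in_eis(origin, infection_set)
                  and all(in_eis(nb, infection_set) for nb in neighbours))
```

The origin differs from each infecting string at two adjacent positions, so it is excluded from all three immunity sets, whereas each neighbour differs from one of the infecting strings at a single position only.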

Figure 4 (top) shows the average number of connected clusters in the CEIS
(divided by the maximum value for each $n$) as a function of the rescaled variable:

k'(n) = — 1 , (19) 

where the exponent $\gamma \simeq 0.12$ estimates the finite-size scaling effects and $2^{-\eta n}$ is
the fraction of strings contained in the immunity set of a single strain, with
$\eta = 1 - \log_2 \Phi \simeq 0.306$. Therefore the normalized number of connected clusters
in the CEIS can be rescaled onto a single master curve as $n$ increases, thanks to
a suitable rescaling of $k$.
Figure 4: Top. Average number of connected clusters, $F_{15,k}(j)$, composing the CEIS as
a function of $k'(n)$ as defined in the text. For each value of $n$ the functions
plotted have been divided by their maximum value. The functions collapse
onto a well defined function, independent of $n$, as $n$ increases. Inset. Distributions
$F_{15,k}(j)$ for several values of $k$. For $k < 90$ the functions feature two
disjoint peaks; the rightmost peak, whose area is equal to 1, moves to the
left as $k$ increases. For $k > 90$ the two peaks merge. Bottom. Sketch of the
topology of the immunized (green) and non-immunized (blue) regions of the
sequence space. Left: the CEIS is composed of one big connected cluster,
corresponding to a non-immunized region of the hypercube, and many small
connected clusters, corresponding to "holes" contained in the EIS. Right:
the whole hypercube is immunized apart from a few non-immunized clusters.
A more detailed description of the cluster structure of the CEIS can be
obtained by investigating the dependence on $n$ and $k$ of the average size of
the connected components composing it. To this end we define the distribution
function $F_{n,k}(j)$ as the average number
of connected clusters with cardinality $j$ generated by an infection set of $k$
randomly drawn strains. In Fig. 4 (inset of the top panel) we report the
distributions for $n = 15$ and several values of $k$. It turns out that, for values of $k$
not too large, the complementary set is composed of one big connected cluster
and many small connected clusters. In fact the distributions exhibit two
disjoint peaks: one centered on a large value of $j$, due to the contribution of the
big cluster, and one centered on small values of $j$, given by the contribution of
the small clusters. Further analysis reveals that the area of the former peak is
always equal to 1 and that for every choice of the infection set there is always
only one big cluster. On the other hand, the number of small clusters is
not fixed and depends on the infection set. By contrast, for larger values
of $k$, only the small connected clusters remain: as $k$ increases, the rightmost peak
moves to the left and merges with the leftmost peak. As sketched
in Fig. 4 (bottom), it is reasonable to interpret the big cluster as a region of the
hypercube which has not been immunized yet and the small connected clusters
of the CEIS as holes contained in the immunized region of the hypercube.
This cluster structure of the CEIS could have a strong impact on the underlying
virus-host interaction, which could be investigated through a more realistic
simulation of the virus-host dynamics.
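A distribution of the kind $F_{n,k}(j)$ can be estimated with a connected-component search on the hypercube restricted to the CEIS. The sketch below is an assumption-laden illustration rather than the code behind Fig. 4: the names `ceis_cluster_sizes` and `average_distribution` are hypothetical, strings are encoded as integers, the infecting strains are drawn without replacement, and the number of trials is arbitrary.

```python
import random
from collections import Counter

def ceis_cluster_sizes(n, infection_set):
    """Sizes of the connected clusters of the CEIS (strings outside every
    immunity set), where two strings are linked if they differ in one bit."""
    mask = (1 << n) - 1
    offsets = [d for d in range(1 << n)
               if d & (((d << 1) | (d >> (n - 1))) & mask) == 0]
    covered = {a ^ d for a in infection_set for d in offsets}
    unvisited = set(range(1 << n)) - covered
    sizes = []
    while unvisited:
        stack = [unvisited.pop()]      # seed of a new cluster
        size = 1
        while stack:
            z = stack.pop()
            for b in range(n):
                w = z ^ (1 << b)       # nearest neighbour of z
                if w in unvisited:
                    unvisited.remove(w)
                    stack.append(w)
                    size += 1
        sizes.append(size)
    return sorted(sizes, reverse=True)

def average_distribution(n, k, trials=20, seed=0):
    """Average number of CEIS clusters of each size j over random
    infection sets of k distinct strains (an estimate of F_{n,k}(j))."""
    rng = random.Random(seed)
    totals = Counter()
    for _ in range(trials):
        infection_set = rng.sample(range(1 << n), k)
        totals.update(ceis_cluster_sizes(n, infection_set))
    return {j: count / trials for j, count in sorted(totals.items())}
```

As a usage example, `ceis_cluster_sizes(6, [0b000011, 0b001100, 0b110000])` corresponds to the three-string infection set discussed earlier, and the returned list contains a cluster of size 1 for the isolated null string.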